307 research outputs found

    Stacked Penalized Logistic Regression for Selecting Views in Multi-View Learning

    In biomedical research, many different types of patient data can be collected, such as various types of omics data and medical imaging modalities. Applying multi-view learning to these different sources of information can increase the accuracy of medical classification models compared with single-view procedures. However, collecting biomedical data can be expensive and/or burdensome for patients, so it is important to reduce the amount of required data collection. It is therefore necessary to develop multi-view learning methods which can accurately identify those views that are most important for prediction. In recent years, several biomedical studies have used an approach known as multi-view stacking (MVS), where a model is trained on each view separately and the resulting predictions are combined through stacking. In these studies, MVS has been shown to increase classification accuracy. However, the MVS framework can also be used for selecting a subset of important views. To study the view selection potential of MVS, we develop a special case called stacked penalized logistic regression (StaPLR). Compared with existing view-selection methods, StaPLR can make use of faster optimization algorithms and is easily parallelized. We show that nonnegativity constraints on the parameters of the function which combines the views play an important role in preventing unimportant views from entering the model. We investigate the performance of StaPLR through simulations, and consider two real data examples. We compare the performance of StaPLR with an existing view selection method called the group lasso and observe that, in terms of view selection, StaPLR is often more conservative and has a consistently lower false positive rate. Comment: 26 pages, 9 figures. Accepted manuscript.
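
The stacking idea in the abstract above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the ridge-penalized base-learners, the nonnegative L1-penalized meta-learner, the simple gradient-descent fitting, and all default values are assumptions made for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(b, w, x):
    # Predicted probability of class 1 for one observation x.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, l2=0.0, nonneg_l1=None, lr=0.5, steps=300):
    """Fit logistic regression by (projected) gradient descent.

    l2 adds a ridge penalty. If nonneg_l1 is given, an L1 penalty of that
    size is applied and the weights are clipped at zero after every step.
    Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(steps):
        resid = [predict(b, w, x) - yi for x, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * x[j] for r, x in zip(resid, X)) / n + l2 * w[j]
                  for j in range(p)]
        b -= lr * grad_b
        for j in range(p):
            w[j] -= lr * grad_w[j]
            if nonneg_l1 is not None:
                # Proximal step for an L1 penalty under a nonnegativity constraint.
                w[j] = max(0.0, w[j] - lr * nonneg_l1)
    return b, w


def cv_predictions(X, y, k=5, l2=0.1):
    # Out-of-fold predicted probabilities from a ridge base-learner on one view.
    n = len(X)
    preds = [0.0] * n
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        b, w = fit_logistic([X[i] for i in train], [y[i] for i in train], l2=l2)
        for i in range(n):
            if i % k == fold:
                preds[i] = predict(b, w, X[i])
    return preds


def multi_view_stacking(views, y, l1=0.01):
    """Train one base-learner per view, then combine their out-of-fold
    predictions with a nonnegative, L1-penalized meta-learner.

    Returns (intercept, one meta-weight per view); a weight of zero means
    the corresponding view is not selected.
    """
    Z = list(zip(*[cv_predictions(V, y) for V in views]))
    return fit_logistic(Z, y, nonneg_l1=l1)
```

Each element of `views` is a list of feature rows for the same observations, and `y` holds 0/1 labels. Because the meta-weights are clipped at zero, a view whose out-of-fold predictions carry no signal tends to receive a weight of exactly zero, which is the view-selection behaviour the abstract attributes to the nonnegativity constraints.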

    Research Openness in Canadian Political Science: Toward an Inclusive and Differentiated Discussion

    In this paper, we initiate a discussion within the Canadian political science community about research openness and its implications for our discipline. This discussion is important because the Tri-Agency has recently released guidelines on data management and because a number of political science journals, from several subfields, have signed the Journal Editors’ Transparency Statement requiring data access and research transparency (DA-RT). As norms regarding research openness develop, an increasing number and range of journals and funding agencies may begin to implement DA-RT-type requirements. If Canadian political scientists wish to continue to participate in the global political science community, we must take careful note of, and be proactive participants in, the ongoing developments concerning research openness.

    The Bradley–Terry Regression Trunk approach for Modeling Preference Data with Small Trees

    This paper introduces the Bradley-Terry regression trunk model, a novel probabilistic approach for the analysis of preference data expressed through paired comparison rankings. In some cases, it may be reasonable to assume that the preferences expressed by individuals depend on their characteristics. Within the framework of tree-based partitioning, we specify a tree-based model estimating the joint effects of subject-specific covariates over and above their main effects. We therefore combine a tree-based model and the log-linear Bradley-Terry model, using the outcome of the comparisons as the response variable. The proposed model provides a way to discover interaction effects when no a priori hypotheses are available. It produces a small tree, called a trunk, that represents a fair compromise between a simple interpretation of the interaction effects and an easy-to-read partition of judges based on their characteristics and the preferences they have expressed. We present an application to a real dataset following two different approaches, and a simulation study to test the model's performance. The simulations showed that model performance improves as the number of rankings and objects increases. In addition, performance improves considerably when the judges' characteristics have a high impact on their choices.
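
For readers unfamiliar with the Bradley-Terry model that the abstract above builds on, here is a minimal sketch of the basic model only, without covariates or trees. The function names and the minorization-maximization fitting routine are assumptions chosen for illustration; the paper itself uses a log-linear formulation combined with a regression trunk, which this sketch does not attempt to reproduce.

```python
def bt_probability(worth_i, worth_j):
    # Bradley-Terry probability that object i is preferred to object j.
    return worth_i / (worth_i + worth_j)


def fit_bradley_terry(wins, iterations=200):
    """Estimate worth parameters from a matrix of pairwise win counts.

    wins[i][j] is the number of times object i was preferred to object j.
    Uses minorization-maximization updates and normalizes the worths to
    sum to one. Assumes every object wins at least once and loses at
    least once; otherwise the estimates are degenerate.
    """
    m = len(wins)
    worth = [1.0 / m] * m
    for _ in range(iterations):
        updated = []
        for i in range(m):
            total_wins = sum(wins[i][j] for j in range(m) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (worth[i] + worth[j])
                        for j in range(m) if j != i)
            updated.append(total_wins / denom)
        scale = sum(updated)
        worth = [u / scale for u in updated]
    return worth
```

With two objects where the first is preferred in three of four comparisons, `fit_bradley_terry([[0, 3], [1, 0]])` gives worths in a 3:1 ratio, so the fitted preference probability matches the observed proportion.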

    Continuous Sweep: an improved, binary quantifier

    Quantification is a supervised machine learning task focused on estimating the class prevalence of a dataset rather than labeling its individual observations. We introduce Continuous Sweep, a new parametric binary quantifier inspired by the well-performing Median Sweep. Median Sweep is currently one of the best binary quantifiers, and we modify it in three respects: (1) using parametric class distributions instead of empirical distributions, (2) optimizing decision boundaries instead of applying discrete decision rules, and (3) calculating the mean instead of the median. We derive analytic expressions for the bias and variance of Continuous Sweep under general model assumptions. This is one of the first theoretical contributions in the field of quantification learning. Moreover, these derivations enable us to find the optimal decision boundaries. Finally, our simulation study shows that Continuous Sweep outperforms Median Sweep in a wide range of situations.
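
To make the quantification task in the abstract above concrete, the sketch below estimates class prevalence with an adjusted count at several thresholds and then combines the estimates. It is a simplified, empirical illustration in the spirit of a threshold sweep, not an implementation of Median Sweep or Continuous Sweep as published: the function names, the `min_gap` filter and its default, and the use of labelled held-out scores to estimate the true and false positive rates are all assumptions. Passing `combine=mean` only mimics the third change listed in the abstract; the parametric distributions and optimized decision boundaries are not modelled here.

```python
from statistics import mean, median


def rates(pos_scores, neg_scores, threshold):
    # True and false positive rates at a threshold, from labelled scores.
    tpr = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return tpr, fpr


def adjusted_count(scores, threshold, tpr, fpr):
    # Correct the raw share of predicted positives for classifier error.
    observed = sum(s >= threshold for s in scores) / len(scores)
    return (observed - fpr) / (tpr - fpr)


def sweep(pos_scores, neg_scores, test_scores, thresholds,
          min_gap=0.25, combine=median):
    """Combine adjusted-count prevalence estimates across thresholds.

    Thresholds where tpr - fpr <= min_gap are skipped because the
    adjustment is unstable there. Each estimate is clipped to [0, 1].
    Raises statistics.StatisticsError if no threshold passes the filter.
    """
    estimates = []
    for t in thresholds:
        tpr, fpr = rates(pos_scores, neg_scores, t)
        if tpr - fpr > min_gap:
            estimate = adjusted_count(test_scores, t, tpr, fpr)
            estimates.append(min(1.0, max(0.0, estimate)))
    return combine(estimates)
```

The output is a single prevalence estimate for the unlabelled `test_scores`; no individual observation is ever assigned a label, which is what distinguishes quantification from classification.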

    Analyzing hierarchical multi-view MRI data with StaPLR: An application to Alzheimer's disease classification

    Multi-view data refers to a setting where features are divided into feature sets, for example because they correspond to different sources. Stacked penalized logistic regression (StaPLR) is a recently introduced method that can be used for classification and for automatically selecting the views that are most important for prediction. We introduce an extension of this method to a setting where the data has a hierarchical multi-view structure. We also introduce a new view importance measure for StaPLR, which allows us to compare the importance of views at any level of the hierarchy. We apply our extended StaPLR algorithm to Alzheimer's disease classification where different MRI measures have been calculated from three scan types: structural MRI, diffusion-weighted MRI, and resting-state fMRI. StaPLR can identify which scan types and which derived MRI measures are most important for classification, and it outperforms elastic net regression in classification performance. Comment: 36 pages, 9 figures. Accepted manuscript.

    The detection and modeling of direct effects in latent class analysis

    Several approaches have been proposed for latent class modeling with external variables, including one-step, two-step and three-step estimators. However, very little is known about the performance of these approaches when direct effects of the external variable on the indicators of latent class membership are present. In the current article, we compare those approaches and investigate the consequences of not modeling these direct effects when present, as well as the power of residual and fit statistics to identify such effects. The results of the simulations show that not modeling direct effects can lead to severe parameter bias, especially with a weak measurement model. Both residual and fit statistics can be used to identify such effects, as long as the number and strength of these effects are low and the measurement model is sufficiently strong.

    Call-duration and triage decisions in out of hours cooperatives with and without the use of an expert system

    Background: Cooperatives delivering out of hours care in the Netherlands are hesitant about the use of expert systems during triage. Apart from the extra costs, cooperatives are not sure that quality of triage is sufficiently enhanced by these systems and believe that call duration will be prolonged drastically. No figures are available about the influence of the use of an expert system during triage on call duration and triage decisions in out of hours care in the Netherlands.
    Methods: Electronically registered data concerning call duration and triage decisions were collected in two cooperatives. One was in Tilburg, a city in the south of the Netherlands, and used an expert system; the other was in Groningen, a city in the north, and did not use an expert system. Additional relevant information about the care process was also collected. Data about call duration were compared using an independent-samples t-test. Data about triage decisions were compared using a chi-square test.
    Results: The mean call time in the cooperative using the TAS expert system is 4.6 minutes; in the cooperative not using the expert system it is 3.9 minutes, a significant difference of 0.7 minutes (95% CI 0.4 – 1.0). In the cooperative with an expert system a larger percentage of patients is handled by the assistant, and patients are less often referred to a telephone consultation with the GP and are less likely to be offered a visit by the GP. A quick interpretation of the impact of the difference in triage decisions shows that these may be large enough to support the hypothesis that longer call duration is compensated for by fewer contacts with the GP (by telephone or face-to-face). There is no proof, however, that these differences are caused by the use of the triage system. The larger number of calls handled by the assistant may be partly caused by the fact that the assistants in the cooperative with an expert system more often consult the GP during triage. And it is not likely that the larger number of home visits in Groningen can be attributed to the absence of an expert system. The expert system only offers advice on whether a GP should be seen, not in which way (by consultation in the office or by home visit).
    Conclusion: The differences in call times between a cooperative using an expert system and a cooperative not using an expert system are small (0.4 – 1.0 min). Differences in triage decisions were found, but it is not proven that these can be attributed to the use of an expert system.

    The Internet addiction components model and personality: Establishing construct validity via a nomological network

    There is growing concern over excessive and sometimes problematic Internet use. Drawing upon the framework of the components model of addiction (Griffiths, 2005), Internet addiction appears as a behavioural addiction characterised by the following symptoms: salience, withdrawal, tolerance, mood modification, relapse and conflict. A number of factors have been associated with an increased risk for Internet addiction, including personality traits. The overall aim of this study was to establish the association between personality traits and the Internet addiction components model in order to develop a theoretical framework via a nomological network. Internet addiction and personality traits were assessed in two independent samples of 3,105 adolescents in the Netherlands and 2,257 university students in England. The results indicate that low agreeableness and high neuroticism/low emotional stability are associated with the Internet addiction components factor in both samples. However, low conscientiousness and low resourcefulness predicted it in the adolescent sample only. The implications include the use of the Internet addiction components model as a parsimonious tool for the initial screening of potential clients in mental health institutes, and identifying populations at risk through their personality traits, which may prove advantageous for the initiation of targeted prevention efforts.